



FIG. 1



```

*bp = (*bp & rm[offset]) | ((cd<<offset) & mask) ← 50

1:  const   tmp1,mask          ;load mask
2:  consth  tmp1,mask
3:  sll     tmp2,cd,offset     ;cd<<offset
4:  and     tmp2,tmp1,tmp2     ;(cd<<offset) & mask
5:  const   tmp1,_rm           ;base address rm
6:  consth  tmp1,_rm
7:  add     tmp1,tmp1,offset   ;address of rm[offset]
8:  load    tmp1,(tmp1)        ;rm[offset]
9:  load    tmp3,(bp)          ;*bp
10: and     tmp1,tmp3,tmp1     ;*bp & rm[offset]
11: or      tmp1,tmp1,tmp2     ;final expression
12: store   tmp1,(bp)          ;assign *bp

```

FIG. 3

| Stage              | Issue 1 | Issue 2 | Issue 3 | Issue 4 |
|--------------------|---------|---------|---------|---------|
| Write Back         | none    | none    | none    | none    |
| Execute            | 0x1000  | 0x1001  | 0x1002  | 0x1003  |
| Memory 1           | 0x1006  | 0x1007  | 0x1008  | none    |
| Memory 0           | 0x100A  | none    | none    | none    |
| Address Generation | 0x100B  | 0x100D  | 0x100F  | 0x1011  |
| Read Data          | none    | none    | none    | none    |
| Grouping           | 0x1013  | 0x1014  | 0x1015  | 0x1016  |
| Fetch/Decode       | 0x1017  | 0x1018  | 0x1019  | none    |

60

FIG. 4



FIG. 5

Disassembly

469 cycles

TARGET DISASSEMBLED CODE

```

-- 0x1e0      : ldd    r2, a7, 0x1
-- 0x1e1      : mov    a6, a7
-- 0x1e2      : add    a6, 3
-- 0x1e3      : ldd    r4, a6
-- 0x1e4      : mov.e  a1, r2
-- 0x1e5      : mov.e  a0, r4
-- 0x1e6      : ld     r6, a0
-- 0x1e7      : st     r6, a1
-- 0x1e8      : ld     r6, a1
-- 0x1e9      : iadd.e r4, 1
-- 0x1ea      : std    r4, a6
-- 0x1ec      : iadd.e r2, 1
-- 0x1ed      : std    r2, a7, 0x1
-- 0x1ef      : cmp    r6, 0
-- 0x1f0      : bnez   0x1e1
-- 0x1f1      : ldd    a0, a7, 0x5
-- 0x1f2      : add    a7, 6
-- 0x1f3      : _FUNC_EXIT_memcpy:: ret
-- 0x1f4      : write_sdsp:: pushd r10, a7
-- 0x1f5

```

80

FIG. 6



FIG. 7